ABSTRACT

Previous evidence indicates that a number of 
proteins are able to interact with cognate mRNAs. 
These autogenous associations represent important 
regulatory mechanisms that control gene expres- 
sion at the translational level. Using the catRAPID 
approach to predict the propensity of proteins to 
bind to RNA, we investigated the occurrence of au- 
togenous associations in the human proteome. Our 
algorithm correctly identified binding sites in well- 
known cases such as thymidylate synthase, tumor 
suppressor p53, synaptotagmin-1, serine/arginine-
rich splicing factor 2, heat shock 70 kDa, ribonucleic
particle-specific U1A and ribosomal protein S13. In
addition, we found that several other proteins are 
able to bind to their own mRNAs. A large-scale 
analysis of biological pathways revealed that aggre- 
gation-prone and structurally disordered proteins 
have the highest propensity to interact with 
cognate RNAs. These findings are substantiated by 
experimental evidence on amyloidogenic proteins 
such as TAR DNA-binding protein 43 and fragile X 
mental retardation protein. Among the amyloi- 
dogenic proteins, we predicted that Parkinson's 
disease-related α-synuclein is highly prone to
interact with cognate transcripts, which suggests 
the existence of RNA-dependent factors in its 
function and dysfunction. Indeed, as aggregation 
is intrinsically concentration dependent, it is 
possible that autogenous interactions play a 
crucial role in controlling protein homeostasis. 



INTRODUCTION 

Although proteins are involved in almost every cellular 
process, increasing evidence indicates that coding and 
non-coding RNAs play fundamental roles in gene regula- 
tion (1,2) and disease (3,4). Recent studies showed that 
establishment of aberrant associations or disruption of 
functional protein-RNA interactions occurs in neuro- 
logical disorders (5,6). For instance, interaction with 
RNA favors conversion of the alpha-helix-rich prion protein
PrPC into the pathogenic beta-structure-rich insoluble
conformer PrPSc that propagates in Creutzfeldt-Jakob
disease (7). In Alzheimer's disease, the association 
between Amyloid Precursor Protein mRNA and 
iron regulatory protein 1 is disrupted, resulting in 
compromised translation efficiency and elevated cytotox- 
icity (8). 

Protein-RNA associations regulate several processes 
such as synthesis, folding, translocation, assembly and 
clearance of molecules. Previous studies suggested that 
ribonucleoprotein interactions might be able to facilitate 
protein and RNA folding (9,10). As a matter of fact, it has 
been observed that there is strong affinity between amino 
acids and their corresponding codons (11,12), which could 
imply a direct interaction between proteins and their own 
mRNAs (13,14). Indeed, TAR DNA-binding protein 43 
(TDP-43) and Fragile X Mental Retardation protein 
(FMRP) have been found to interact with their own 
mRNAs (15,16). In these cases, expression is regulated 
by a negative feedback loop involving the 3' untranslated 
region (UTR). Other autogenous associations have been 
observed in proteins associated with cell proliferation and 
gene expression (17,18). Structurally disordered
proteins such as Serine/Arginine-rich splicing factor 2
(SRSF2) (19) as well as heterogeneous ribonucleoprotein
members (20,21) are also able to inhibit their translation by
associating with their own mRNAs.

How often do autogenous associations occur in the 
human proteome? Recent technological advances 
revealed that a large number of proteins have RNA- 
binding abilities (22), which suggests that interaction 
with cognate mRNAs could be more frequent than previ- 
ously thought. Are autogenous associations linked to 
specific functions? It is possible that autoregulatory mech- 
anisms are involved in controlling protein production. For 
instance, in the case of TDP-43 and FMRP, inhibition of 
expression via autogenous interaction is a way to preserve 
protein functionality (15,16). Overexpression leads to high 
protein production and enhanced amyloidogenicity, re- 
sulting in harmful gain- or loss-of-function effects on 
cellular metabolism (23). 

In this work, we focused on the ability of proteins to 
establish autogenous associations. Using our computa- 
tional approach catRAPID (24), we studied the occur- 
rence of these interactions in the human proteome. A 
large-scale analysis was performed to identify the role of 
autogenous associations in biological pathways and char- 
acterize their properties. 



MATERIALS AND METHODS 

Biological pathway annotations 

We downloaded (September 2012) pathway data from two 
manually curated and high-quality resources: Reactome 
(25) and the NCI Pathway Interaction Database (NCI- 
PID) (26). The Reactome annotations (version 41) were 
gathered via the BioMart query interface returning a 
list of 167 canonical pathways containing 5375 unique 
protein coding genes, whereas the NCI-PID pathways 
were fetched directly from the database website (241 
pathways, 2053 unique protein coding genes). In both 
cases, UniprotKB (27) accession numbers were con- 
verted to Ensembl (version 68) gene identifiers using the 
UniprotKB id-mapping file (version 2012_08). 
Subsequently, the gene pathway annotations were 
transferred to the corresponding polypeptides and 
coding/non-coding transcripts. 

Protein-RNA interaction prediction 

We used the catRAPID algorithm (24) to predict interaction
propensity among all peptides and transcripts belonging
to Reactome and NCI-PID pathways. catRAPID
was trained on a large set of protein-RNA pairs available 
in the Protein Data Bank to discriminate interacting and 
non-interacting molecules using secondary structure 
propensities, hydrogen bonding and van der Waals con- 
tributions (28). The method was tested on the non-nucleic 
acid-binding database (NNBP; area under the receiver 
operating characteristic curve of 0.92), the NPInter 
database (area under the receiver operating characteristic 
curve of 0.88) and a number of individual interactions 
(e.g. RNAse mitochondrial RNA MRP and X-inactive 
specific transcript XIST networks; average accuracy of 
78%). Owing to CPU limitations in the calculation (29), 
we restricted the predictions to RNA sequences with a
length between 50 and 1500 nt as well as to polypeptides
with a length between 50 and 750 amino acids. The
'fragment' and 'strength' algorithms were used to
identify regions involved in the binding and compute the
specificity with respect to random protein-RNA associations
(5,29). For each protein-RNA pair under investigation,
a reference set of 10^2 protein and 10^2 RNA
molecules (total of 10^4 interactions) was used as a
control (29). Reference sequences have the same lengths 
as the pair of interest to guarantee that the measure is 
independent of protein and RNA lengths (29). 
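
The control procedure can be illustrated with a minimal sketch. It scores a pair of interest against a reference set of 10^2 x 10^2 = 10^4 length-matched pairs and reports the percentage of reference pairs with a lower propensity. The function name `interaction_strength`, the percentile reading of specificity and the toy reference scores are assumptions made for illustration; they are not the catRAPID implementation.

```python
from typing import Sequence


def interaction_strength(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference pairs whose propensity is below `score`.

    Assumption: specificity is read as an empirical percentile; the
    published 'strength' algorithm may differ in detail.
    """
    if not reference_scores:
        raise ValueError("reference set is empty")
    lower = sum(1 for s in reference_scores if s < score)
    return 100.0 * lower / len(reference_scores)


# Hypothetical reference set: 10^2 proteins x 10^2 RNAs = 10^4 pairs,
# filled with made-up propensities 0..99 repeated 100 times.
n_proteins, n_rnas = 100, 100
reference = [float(i % 100) for i in range(n_proteins * n_rnas)]
strength = interaction_strength(89.0, reference)
```

Because the reference molecules share the lengths of the pair of interest, a percentile of this kind would not be driven by sequence length alone.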

Gene partition 

The gene partition function gp(π,P) depends on the interaction
propensity π and the type of protein-RNA association,
which is defined as autogenous (P = a), intra- (P = i) or
inter-pathway (P = I):

$$gp(\pi,P) = \frac{c(\pi,P)}{c(\pi,a) + c(\pi,i) + c(\pi,I)} \qquad (1)$$

The number of counts c(π,P) is the fraction n(p,r,π)
of protein (p) and RNA (r) molecules with interaction
propensity higher than π:

$$c(\pi,P) = \sum_{A(P)} \frac{n(p,r,\pi)}{N(p,r)} \qquad (2)$$

The function N(p,r) is the total number of interactions
in the autogenous, intra- or inter-pathways A(P).
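
As a hedged illustration of Equations 1 and 2, the sketch below computes the gene partition from lists of scored protein-RNA pairs. It uses a simplified reading in which the count for a class is the fraction of its pairs with propensity above the threshold. The function names, the tuple layout and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str, float]  # (protein id, RNA id, interaction propensity)


def counts(pairs: List[Pair], threshold: float) -> float:
    """Fraction of pairs in one class with propensity above `threshold`."""
    if not pairs:
        return 0.0
    above = sum(1 for _, _, score in pairs if score > threshold)
    return above / len(pairs)


def gene_partition(classes: Dict[str, List[Pair]], threshold: float) -> Dict[str, float]:
    """Normalize the per-class counts so that they sum to 1 (Equation 1)."""
    c = {label: counts(pairs, threshold) for label, pairs in classes.items()}
    total = sum(c.values())
    if total == 0:
        return {label: 0.0 for label in c}
    return {label: value / total for label, value in c.items()}


# Toy data: 'a' autogenous, 'i' intra-pathway, 'I' inter-pathway.
toy = {
    "a": [("p1", "r1", 120.0), ("p2", "r2", 60.0)],
    "i": [("p1", "r2", 130.0), ("p2", "r1", 10.0),
          ("p3", "r1", 5.0), ("p3", "r2", 0.0)],
    "I": [("p1", "r9", 20.0), ("p2", "r9", 30.0)],
}
gp = gene_partition(toy, threshold=100.0)
```

In this toy case the counts are 0.5, 0.25 and 0 for the three classes, so the autogenous class takes two-thirds of the partition at a threshold of 100.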

Disorder propensity 

We predicted disorder propensities using the IUPred algo- 
rithm (30) with the 'long disorder' prediction option. We 
defined a residue as disorder prone if its IUPred score was 
above 0.5, as in a previous experimental study (31). 
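
The cutoff can be made concrete with a short sketch that turns per-residue disorder scores into a fraction of disorder-prone residues. The scores below are invented, and the function `disorder_fraction` is a hypothetical helper; it does not call IUPred itself.

```python
from typing import Sequence


def disorder_fraction(scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Fraction of residues whose disorder score is strictly above `cutoff`."""
    if not scores:
        raise ValueError("no residue scores given")
    return sum(1 for s in scores if s > cutoff) / len(scores)


# Made-up per-residue scores for a five-residue peptide.
example_scores = [0.1, 0.4, 0.5, 0.6, 0.9]
fraction = disorder_fraction(example_scores)
```

A residue scoring exactly 0.5 is not counted here, which follows the "above 0.5" wording.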

Comparison of the propensity distributions 

We analyzed the disorder propensity of proteins involved 
in autogenous interactions by comparing them with the 
distributions of all the proteins annotated in the respective 
pathway data set. We evaluated the statistical significance 
using the Kolmogorov-Smirnov test (two-sided, 
alpha = 0.05). 
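
For readers unfamiliar with the test, the following sketch computes a two-sided two-sample Kolmogorov-Smirnov statistic and an asymptotic P-value using only the standard library. The small-sample correction and the series truncation are textbook approximations chosen for illustration; the source does not state which implementation was used, and the sample values are invented.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (D, approximate two-sided P-value) for two samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # D is the largest gap between the two empirical cumulative distributions.
    d = max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in xs + ys
    )
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return d, 1.0  # the P-value is numerically indistinguishable from 1
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Invented disorder percentages for two clearly separated groups.
autogenous = [float(v) for v in range(100, 150)]
background = [float(v) for v in range(0, 50)]
d_stat, p_value = ks_two_sample(autogenous, background)
significant = p_value < 0.05  # alpha = 0.05, two-sided
```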

Pathway enrichment analysis 

We assessed the enrichment of autogenous interactions in 
biological pathways using the Gene Set Enrichment 
Analysis (GSEA) method (32). For each pathway data 
set, we used as background the whole list of autogenous 
interactions predicted by the ca/RAPID algorithm. We 
tested only those pathways annotated with autogenous
interactions and containing at least 5 and at most 500
genes (Supplementary Tables S1-4). We ran GSEA with
default parameters and 1000 permutations.
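
The pathway selection step can be sketched as a simple filter applied before the enrichment analysis. The function `filter_pathways` and all gene and pathway names are hypothetical; the sketch covers only the size and annotation criteria, not GSEA itself.

```python
from typing import Dict, Set


def filter_pathways(
    pathways: Dict[str, Set[str]],
    autogenous_genes: Set[str],
    min_genes: int = 5,
    max_genes: int = 500,
) -> Dict[str, Set[str]]:
    """Keep pathways of admissible size that contain an autogenous gene."""
    return {
        name: genes
        for name, genes in pathways.items()
        if min_genes <= len(genes) <= max_genes and genes & autogenous_genes
    }


# Toy annotations (hypothetical gene and pathway names).
toy_pathways = {
    "pathway_A": {"g1", "g2", "g3", "g4", "g5"},
    "pathway_B": {"g1", "g2"},                     # too small
    "pathway_C": {"g6", "g7", "g8", "g9", "g10"},  # no autogenous gene
}
kept = filter_pathways(toy_pathways, autogenous_genes={"g1"})
```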

Protein and RNA abundances 

Protein abundances were retrieved from the integrated 
whole organism Human PeptideAtlas (33,34), as 
assembled in http://pax-db.org (versions 2009, 2010, 
2011 and 2012). Spectral counting from mass spectrometry
was normalized to overall abundance and expressed
as p.p.m. (parts per million) (35). RNA levels were taken
from Gencode version 7 (36,37) averaging non-zero abundances
in all available tissues. Transcript abundance,
expressed in p.p.m., was estimated from RNA-seq experiments
by normalizing the counts to total number of reads
(36,38). For the calculation of protein-RNA interactions,
2487 proteins were combined with their RNAs (8976 tran- 
scripts considering the isoforms). 
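
The two normalization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `to_ppm` and `mean_nonzero` and the toy counts are hypothetical and do not reproduce the PeptideAtlas or Gencode pipelines.

```python
from typing import Dict, Sequence


def to_ppm(raw_counts: Dict[str, float]) -> Dict[str, float]:
    """Scale raw counts so that they sum to one million (p.p.m.)."""
    total = sum(raw_counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {key: 1e6 * value / total for key, value in raw_counts.items()}


def mean_nonzero(values: Sequence[float]) -> float:
    """Average over tissues, ignoring tissues with zero abundance."""
    nonzero = [v for v in values if v > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


# Toy example: two transcripts in one sample, one transcript in four tissues.
ppm = to_ppm({"transcript_a": 1.0, "transcript_b": 3.0})
tissue_average = mean_nonzero([0.0, 2.0, 4.0, 0.0])
```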



RESULTS 

Protein-RNA interactions in biological pathways 

Using the catRAPID method (5,29), we systematically
investigated the role of autogenous interactions in biolo- 
gical networks. For this purpose, we collected protein and 
RNA sequences annotated in two pathway resources: 
Reactome (25) and NCI-PID (26). 

We first computed the interaction potential of 295 × 10^6
protein-RNA pairs (10 376 protein sequences against
28 493 RNA sequences) in Reactome and 65 × 10^6
protein-RNA pairs (4754 protein sequences against
13 608 RNA sequences) in NCI-PID (Supplementary
Table S1; 'Materials and Methods' section). We then classified
interactions as follows (Figure 1a): (i) intra-pathway,
i.e. between proteins and RNAs coded by different genes
belonging to the same pathway; (ii) inter-pathway, i.e.
between proteins and RNAs coded by different genes belonging
to different non-overlapping pathways; and (iii)
autogenous, i.e. between proteins and RNAs coded by the
same genes. To quantify the proportion of genes that are
preferentially involved in intra-pathway, inter-pathway or
autogenous interactions, we introduced the 'gene partition'
function, which is the fraction of associations predicted at
a certain interaction score (Figure 1b; 'Materials and
Methods' section).
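
The three-way classification can be sketched with a small helper. It makes a simplifying assumption that two different genes sharing no pathway annotation form an inter-pathway pair; the stricter requirement of non-overlapping pathways is not checked. The function `classify_pair` and the gene and pathway labels are hypothetical.

```python
from typing import Dict, Set


def classify_pair(
    protein_gene: str,
    rna_gene: str,
    gene_pathways: Dict[str, Set[str]],
) -> str:
    """Label a protein-RNA pair as autogenous, intra- or inter-pathway."""
    if protein_gene == rna_gene:
        return "autogenous"
    shared = gene_pathways.get(protein_gene, set()) & gene_pathways.get(rna_gene, set())
    return "intra-pathway" if shared else "inter-pathway"


# Toy annotation with made-up gene and pathway names.
annotation = {
    "geneA": {"pathway_1", "pathway_2"},
    "geneB": {"pathway_1"},
    "geneC": {"pathway_3"},
}
labels = [
    classify_pair("geneA", "geneA", annotation),
    classify_pair("geneA", "geneB", annotation),
    classify_pair("geneA", "geneC", annotation),
]
```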

In both Reactome and NCI-PID (Figure 1b;
Supplementary Figure S1), we found that intra-pathway
and inter-pathway interactions are strongly depleted at
high interaction propensities, whereas autogenous associations
are enriched (Figure 1b; 'Materials and Methods'
section). We observed the same trend for both coding and
non-coding transcripts (Supplementary Figure S1).

Biological pathways enriched in autogenous interactions 

To uncover biological processes in which autogenous 
interactions play a functional role, we performed a 
GSEA (32) on both Reactome and NCI-PID pathways. 
In Reactome, we found 10 pathways enriched and 4 
pathways depleted in autogenous interactions with a 
false discovery rate <5% (Supplementary Table S2). The 
top enriched pathway is 'Amyloids' (q-value = 3.2 × 10^-3)
followed by 'Base Excision Repair' (q-value = 3.8 × 10^-3)
and 'Amine compound SLC transporters'
(q-value = 4.4 × 10^-3) (Supplementary Table S2).
Similarly, we identified 13 NCI-PID pathways enriched
in autogenous interactions (Supplementary Table S3).
We did not find any significantly depleted pathway at
false discovery rate <5%. The top enriched NCI-PID
pathways are 'Signaling events mediated by HDAC
Class III' (q-value = 2.9 × 10^-2), 'C-MYC pathway'
(q-value = 3.0 × 10^-2) and 'Hypoxic and oxygen homeostasis
regulation of HIF-1-alpha' (q-value = 3.1 × 10^-2)
(Supplementary Table S3). We identified the 'Botulinum
neurotoxicity'/'Effect of Botulinum toxin' pathway
enriched in both databases (Reactome,
q-value = 1.1 × 10^-2; NCI-PID, q-value = 3.5 × 10^-2) as
well as the 'α-synuclein signaling' pathway enriched in
NCI-PID (q-value = 4.2 × 10^-2) and related to the
Reactome 'Amyloids' pathway (Supplementary Table S3).

Figure 1. Autogenous versus intra- and inter-pathway interactions. (a) Sketch of biological pathways (pink and gray boxes connected by arrows;
Supplementary Table S1). For each pathway, we studied three types of protein-RNA interactions: autogenous (green line), intra- (orange line) and
inter-pathway (blue line). (b) From low to high interaction propensities (24), we found that the autogenous associations dominate over intra- and
inter-pathway interactions present in Reactome (statistics for coding genes are shown; mean and s.e.m. of bins are shown; Supplementary Figure S1)
(25). The gene partition is defined as the total fraction of genes showing preferential enrichment for autogenous, intra- or inter-pathway interactions
(propensity >50: number n of genes enriched in autogenous interactions = 1238 of 1704; propensity >100: n = 211 of 242; propensity >150: n = 20
of 20; Supplementary Figure S2; 'Materials and Methods' section).

Known autogenous interactions in biological pathways 

To assess whether known cases of autogenous interactions 
were linked to the pathways identified in our analysis, we 
carried out a literature search. Indeed, amyloidogenic 
proteins such as TDP-43 and FMRP (Reactome 
pathway 'Amyloids') have strong propensities to bind to 
their mRNAs and have been discussed in our previous 
work (5). 

We found that tumor suppressor p53, involved in the 
two top-enriched pathways 'Signaling events mediated by 
HDAC Class III' and 'Hypoxic and oxygen homeostasis 
regulation of HIF-1 -alpha' (both in NCI-PID), is able to 
bind to its own mRNA (17,39,40). In Mus musculus, the 
RNA-binding site of p53 is a stable stem-loop structure 
that involves the 5' UTR plus a region of 280 nucleotides
in the coding sequence (5' terminal region) (17). A similar
mechanism has been observed in Homo sapiens, but no
conclusive evidence has been reported on the interaction 
(39). In agreement with experimental evidence on murine 
p53, our predictions indicated that the 5' terminal region 
has strong propensity to bind (Figure 2a), and the inter- 
action is specific (interaction strength = 89%) with respect 
to a control set of molecules of same size (Figure 2b; 
'Materials and Methods' section) (17). We found similar 
results for human p53, although the region involved in the 
binding is not known (Supplementary Figure S2a and b). 
It is worth mentioning that p53 has strong propensity to 
form amyloid fibrils (42), and interaction with nucleic 
acids represents a way to control its aggregation potential 
by limiting the amount of protein product (17,43). 
Moreover, we note that p53 can be associated with the 'Base
Excision Repair' pathway (Reactome) (44,45). Indeed, as
'Base Excision Repair' deficiency affects genome stability
and is implicated in many human diseases, including
premature aging, neurodegeneration and cancer (46,47),
we expect that a self-regulatory mechanism in its components
could greatly contribute to the system's efficiency.

Figure 2. The autogenous interaction of tumor suppressor p53. (a) Using the catRAPID algorithm (5,29), we were able to reproduce experimental
evidence on the autogenous interaction of tumor suppressor p53 (17). The binding site is located in the 5' terminal region of the mRNA (the gray box
marks the region observed experimentally) (17). (b) The interaction specificity between p53 and the 5' terminal region of its mRNA was predicted to be
significantly high (89%) with respect to a control set of protein-RNA associations ('Materials and Methods' section). The DNA binding domain and
the disordered regions are reported as indicated in a recent study (41).

In both NCI-PID and Reactome, we found that the 
pathways 'Botulinum neurotoxicity' and 'Effect of 
Botulinum toxin' are significantly enriched in autogenous 
interactions. One of the key players in Botulinum toxicity
is synaptotagmin-1, which is essential in Ca(2+)-depend- 
ent neurotransmitter release (48). Synaptotagmin-1 inter- 
acts with the 3' UTR of its own mRNA (49). In agreement 
with in vitro experiments, our predictions indicated that 
the 3' UTR of synaptotagmin-1 RNA is involved in 
the autogenous interaction (Supplementary Figure S3a 
and b). Importantly, translation of synaptotagmin-1 is 
downregulated by the 3' UTR, which is compatible with 
a negative feedback mechanism (49). As reported by light- 
scattering assays and electron microscopy, the protein 
forms large aggregates in a calcium-dependent manner 
(48), suggesting that the autogenous interaction protects 
against production of toxic oligomers (49). 

Thymidylate synthase catalyzes the reaction generating 
thymidine monophosphate, which is phosphorylated to 
thymidine triphosphate for use in DNA synthesis and 
repair. Thymidylate synthase forms a ribonucleoprotein 
complex with C-MYC mRNA (50) and interacts with its 
own mRNA (51). The protein is not reported in the 
'C-MYC pathway' (NCI-PID), but solid evidence exists
on its interaction with the C-MYC network (52). The RNA
binding site for thymidylate synthase is within the 5' UTR
of the transcript (52). catRAPID correctly located the
interaction within the first 188 nt of the 5' UTR
(Supplementary Figure S4a) and predicted high specifi- 
city for the binding (interaction strength = 99%; 
Supplementary Figure S4b). We note that the bacterial 
homologue of thymidylate synthase is able to associate 
with its cognate mRNA (53), which highlights the 
crucial role of autogenous interactions across different 
species. 

Autogenous interactions and structural disorder 

A recent study (31) showed that many RNA-binding 
proteins contain intrinsically disordered regions. Using 
the IUPred algorithm (30), we investigated the role of 
structural disorder in biological pathways. We found 
that proteins involved in autogenous interactions have a 
significantly higher fraction of disorder-prone residues
compared with all proteins annotated in Reactome and 
NCI-PID (Figure 3 and Supplementary Figure S5; same 
results were observed for both coding and non-coding 
RNAs). To investigate whether known cases of autogen- 
ous interactions are enriched in unstructured regions, we 
performed a literature search. 

We found that the human SRSF2, which has a long
disordered C-terminus spanning amino acids 117-221,
interacts with its own transcript (19) (Figure 4a).
Notably, catRAPID correctly identified the binding site
between the RRM domain (amino acids 14-92) and
region I/II of the terminal exon (55) (Figure 4b; interaction
strength = 84%). The disordered region was predicted
not to interact with the terminal exon. As a
matter of fact, the C-terminal region participates in
processes that only indirectly relate to the RNA-binding
activity of the protein: facilitation of Ser/Arg phosphorylation
to allow entrance in the nucleus (56) and establishment
of low-affinity interactions to enhance the splicing
activity (57).

Figure 3. Structural disorder of proteins involved in autogenous interactions.
(a) From low to high protein-RNA interaction propensities
(24), we observed an increase in percentage of disordered residues (30),
which is in agreement with previous experimental evidence (31)
(Supplementary Figure S5). (b) The average and median values for
the percentage of disordered residues are reported at different interaction
propensities (Reactome database; propensity >50: number n of
proteins = 1521; propensity >100: n = 353; propensity >150: n = 57).
The statistical significance was assessed with the Kolmogorov-Smirnov
test (KS test).

Human Heat Shock 70 kDa (HSP70) interacts with its
own mRNA (58) and has a disordered C-terminal region
of ~10 kDa [highly conserved across species (59)] containing
the Glu-Glu-Val-Asp regulatory motif. Using
catRAPID, we predicted that the binding occurs at the
3' UTR, which is in agreement with previous observations
(Supplementary Figure S6a) (60). Indeed, HSP70 has a
strong tendency to bind to AU-rich sequences that are
located at the 3' UTR (Supplementary Figure S6b) (61).
Importantly, we predicted that both the N-terminal
ATPase domain and the disordered C-terminus are
involved in the interaction, which is consistent with the
observation that HSP70 RNA-binding affinity depends
on the ATPase domain but is considerably reduced
when the disordered region is removed (62)
(Supplementary Figure S6c). A high concentration of
HSP70 is toxic, as the protein has a strong tendency to
aggregate (58). Hence, interaction with mRNA represents
a mechanism to control protein production and formation
of toxic aggregates.

Figure 4. The autogenous interaction of SRSF2. (a) The catRAPID algorithm (5,29) was used to reproduce experimental evidence on the autogenous
interaction of SRSF2 (19). The binding site is located in the final exon of SRSF2 mRNA (the blue box marks the region I/II that was determined
experimentally) (19). (b) The interaction strength between SRSF2 and region I/II is significantly specific (84%) with respect to a control set of
protein-RNA associations ('Materials and Methods' section). The RNA-binding domain RRM and disordered regions were reported as indicated in a
previous study (54) and in agreement with the UniProt (27) annotation (entry Q01130).

The small nuclear RiboNucleic Particle-specific U1A 
protein also binds to its own transcript with high affinity 
and specificity (18). Human U1A protein comprises two 
RNP domains separated by a disordered linker containing 
a nuclear localization signal. The determinants of protein- 
RNA specificity reside within amino acids 1-102 of the 
first RNP domain and a region in the linker (63). Using 
catRAPID, we predicted interactions between the 3' UTR 
and the first RNP domain as well as a region of the disordered
linker (Supplementary Figure S7a and b; protein
domains taken from UniProtKB entry P09012) (18). We
also predicted interaction within the second RNP
domain, which has not been reported in the literature but is
involved in pre-mRNA recognition (64). We note that
unstructured regions of the protein participate in base rec- 
ognition by forming direct and water-mediated hydrogen 
bonds with RNA (65). 

Ribosomal protein S13 also shows moderate presence of
secondary structure (66) and is able to bind to its own
mRNA (67). Indeed, even though no information is available
on its native state, it is possible that S13 contains
disordered regions like most ribosomal proteins (68,69).
In agreement with experimental evidence, catRAPID
identified the S13 binding site within the first and second
exon (Supplementary Figure S8a and b) (67).

We note that p53 and synaptotagmin-1 also contain
disordered regions, as shown by previous studies (41,70).
In the calculations of autogenous interactions involving 
disordered proteins, we used protein and RNA sequences 
as reported in the original papers. As the length of the 
RNA sequences exceeded catRAPID size limitation, the 
'fragmentation' algorithm (5,29) was used to identify 
regions involved in the binding ('Materials and 
Methods' section). 

Autogenous interactions in control of protein translation 

Autogenous interactions regulate gene expression at the 
translational level by controlling protein concentration. 
If protein concentration is high, binding to mRNA is 
expected to have a major effect on translation efficiency. 

Figure 5. Autogenous interactions in control of protein translation.
From low to high expression levels, increase in autogenous interaction
propensity (black bars) is associated with reduced translation efficiency
[(protein)/(RNA) < 1], which is compatible with a negative feedback
loop mechanism. When the propensity for autogenous interaction is
low (green bars), translation efficiency is high [(protein)/(RNA) > 1],
suggesting that the feedback loop mechanism is not active. Protein
and RNA abundances were taken from PeptideAtlas (33,34) and
Gencode version 7 (37) (error bars are standard deviations on interaction
propensities; p.d.u. is procedure defined unit; p.p.m. is parts per
million).



We investigated the relationship between protein abun- 
dance and propensity for autogenous interactions. From 
low to high expression levels, we found that reduction in 
translation efficiency (protein to RNA abundance ratio 
<1) is accompanied by an increase in autogenous interaction
propensity (Figure 5), which is fully compatible
with a negative feedback loop mechanism. If the 
negative feedback loop is active, translation slows down, 
and the ratio between protein and RNA abundance de- 
creases. By contrast, in case of efficient translation 
(protein to RNA abundance ratio >1), the propensity 
for autogenous interactions is low (Figure 5), suggesting 
that the feedback loop does not occur. 
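
The ratio used in this analysis can be written as a two-line calculation. The sketch below is illustrative only: the helper names and the abundance values are hypothetical, and a ratio below 1 is read as compatible with negative feedback rather than as proof of it.

```python
def translation_efficiency(protein_ppm: float, rna_ppm: float) -> float:
    """Protein to RNA abundance ratio, both expressed in p.p.m."""
    if rna_ppm <= 0:
        raise ValueError("RNA abundance must be positive")
    return protein_ppm / rna_ppm


def feedback_compatible(protein_ppm: float, rna_ppm: float) -> bool:
    """True when the ratio is below 1 (reduced translation efficiency)."""
    return translation_efficiency(protein_ppm, rna_ppm) < 1.0


# Hypothetical abundances giving a ratio of 0.58.
ratio = translation_efficiency(58.0, 100.0)
compatible = feedback_compatible(58.0, 100.0)
```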

Hence, our findings indicate that autogenous inter- 
actions might be involved in the modulation of protein 
expression by inducing regulatory feedback loops at the 
translational level. According to our analysis and in agree- 
ment with previous reports (15,49,58), negative feedback 
loops are a common type of mechanism involving 
autogenous interactions (for a list of predictions, see 
Supplementary Table S4). 

A hypothesis on alpha-synuclein association 

Among the amyloid proteins, we predicted that α-synuclein
(SNCA gene) is significantly prone to autogenous interactions.
The average interaction propensity with SNCA
transcripts is 68 ± 4 (P = 1.2 × 10^-12), and the average
interaction propensity with transcripts coding for the
major protein isoform is 72 ± 5 (P = 1.3 × 10^-8;
P-values were calculated with the Mann-Whitney U test).
The most abundant isoforms ENST00000394986,
ENST00000336904 and ENST00000394991 (36,37) have
interaction propensities of 81, 71 and 61, respectively.
Intriguingly, the transcripts coding for the major protein
isoform have a protein to RNA abundance ratio of 0.58,
which is compatible with a negative feedback mechanism.
The protein has been found to be present in both the cytoplasm
and nucleus (71,72).

Alpha-synuclein is a 14 kDa protein composed of an 
amphipathic, positively charged 100 residue N-terminal 
domain with a lysine-rich N-terminus that binds reversibly 
to anionic membranes (73) and a 40-residue acidic 
C-terminal domain (74). The protein is predominantly 
monomeric in solution with a smaller fraction of 
multimeric species and is intrinsically unstructured 
(75,76). Importantly, interactions with double- or single-
stranded DNA are able to convert α-synuclein into a
highly structured protein (77). Circular dichroism shows
that the α-helical content increases from 5 to 64% upon
binding to DNA, whereas the random coil decreases from
95 to 33% (77).

The fact that α-synuclein interacts with DNA suggests
that nucleic acid interactions might be relevant for its regulation.
Our calculations indicate that ENST00000394986,
ENST00000336904 and ENST00000394991 bind to
α-synuclein at the 5' UTRs (Figure 6a and Supplementary
Figure S9). We predicted that the 5' UTR interaction is
specific (the interaction strengths of ENST00000394986,
ENST00000336904 and ENST00000394991 are 100, 95
and 99%, respectively; see Figure 6b) and within GC-rich
regions (Supplementary Figure S10), in agreement with
previous evidence showing α-synuclein preference for G
and C nucleotides (77). Moreover, a lysine-rich region
spanning residues 40-60 was predicted by catRAPID to
be involved in RNA recognition, which is consistent with
previous results indicating an anion binding ability of the
N-terminus (72).
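
To make the notion of a GC-rich region concrete, the sketch below computes GC content over sliding windows of a nucleotide sequence. The window and step sizes and the example sequence are arbitrary choices for illustration and are not taken from the analysis.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_windows(seq: str, window: int = 50, step: int = 10) -> List[Tuple[int, float]]:
    """GC content of each window, returned as (start position, fraction)."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Made-up sequence: a GC-only stretch followed by an A-only stretch.
profile = gc_windows("GGGGAAAA", window=4, step=4)
```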

At present, it is unknown whether RNA associations
protect α-synuclein against formation of toxic aggregates.
As a matter of fact, interaction with DNA markedly increases
α-synuclein amyloidogenicity (78), and a study on
the Hofmeister series showed that anion binding promotes
fibrillization (79). As GC-rich DNA aptamers have been
found to associate with both monomeric and oligomeric
α-synuclein (80), it is likely that stable RNA secondary
structures, enriched in GC content, facilitate the
disorder-to-order transition of α-synuclein, which could
result in production of partially folded and highly
amyloidogenic intermediates.



DISCUSSION 

In this work, we used the catRAPID method (24) to
compute the propensity of proteins to interact with 
coding and non-coding transcripts. In a number of biolo- 
gical pathways annotated in Reactome and NCI-PID, we 
found enrichment for autogenous interactions. 

Our results are in agreement with available experimental
evidence on the amyloidogenic TDP-43 and FMRP
(the 'Amyloids' pathway has the highest enrichment in
autogenous interactions) (15,16), tumor suppressor p53
('Signaling events mediated by HDAC Class III',
'Hypoxic and oxygen homeostasis regulation of HIF-1-alpha'
and 'Base Excision Repair') (39), synaptotagmin-1
('Botulinum neurotoxicity' and 'Effect of Botulinum
toxin') (49) and thymidylate synthase (connected to
'C-MYC pathway') (51). We expect that other autogenous
interactions will occur in these pathways, although few
cases have been reported in the literature. For instance, the
RNA-binding chaperone HSP90 (81,82) is involved in the
'Hypoxic and oxygen homeostasis regulation of HIF-1-alpha'
pathway, and it might have the ability to associate
with its own transcript like other heat shock proteins
(58,83). Moreover, we found enrichment in pathways
such as 'amine compound SLC transporters', 'metabolism
of nitric oxide', 'iron uptake and transport', 'ABC-family
proteins mediated transport', 'metabolism of vitamins and
cofactors', 'amino acid transport' and 'energy metabolism',
where autogenous interactions could be playing a role in
metabolic regulation (84).

Figure 6. A hypothesis on the autogenous association of α-synuclein. Among the amyloid proteins, we found that α-synuclein (SNCA gene) is
significantly prone to autogenous interactions (average interaction propensity with SNCA transcripts = 68 ± 4). (a) Our calculations indicated that
transcript ENST00000394986, which is abundant in brain, binds to α-synuclein at the 5' UTR (gray box). (b) The 5' UTR association was predicted
to be specific (interaction strength = 100%) and involving a GC-rich region (Supplementary Figure S9), in agreement with previous experimental
evidence showing that α-synuclein binds to G and C nucleotides (77).

Our analysis showed that disordered proteins have sig- 
nificant propensity for autogenous interactions. The 
results are in agreement with experimental evidence on 
SRSF2 (19), HSP70 (58), U1A (18), p53 (17) and 
synaptotagmin-1 (49). The fact that proteins containing 
disordered regions have a high potential for autogenous 
interactions suggests that RNA interactions could protect 
unstructured domains from aberrant interactions or ag- 
gregation (85,86). Indeed, it has been observed that 
polyanionic molecules increase the solubility of nascent 
polypeptides, and that RNA molecules can act as 
molecular chaperones helping proteins to fold into their 
native structures (87). As a matter of fact, interaction
with cognate nucleic acids directly influences the 
aggregation propensity of TDP-43 (15,88), FMRP 
(89,90), p53 (17,43), synaptotagmin-1 (48,49) and HSP70 
(58,83). 

In the 'Amyloids' pathway, we focused on α-synuclein,
a highly disordered and amyloidogenic protein linked to
pathogenic processes such as Parkinson's disease and
Lewy body dementia. Previous evidence indicated that
α-synuclein forms partially folded multimers and aggregates
impairing neuronal viability (75). We predicted that
α-synuclein is able to establish autogenous interactions,
which suggests the existence of RNA-dependent factors
in the etiopathogenesis of Parkinson's disease and other
synucleinopathies. Indeed, our findings indicate that
α-synuclein solubility might be modulated in vivo by associations
with molecules such as nucleic acids (91,92).
To date, it has been shown that interaction
with GC-rich DNA increases α-synuclein
amyloidogenicity (77).

Although α-synuclein is present at the presynaptic terminals
of neurons (93), several studies show that it can
localize in the nucleus (94,95). It should be considered
that some mRNAs are shuttled to the axonal periphery
where local synthesis takes place (96). Hence, the
α-synuclein localization pattern and the presence of
mRNAs at the neuronal terminals suggest that the predicted
interaction of α-synuclein with RNA molecules
may occur in the cellular context.

As in the case of other ribonucleoprotein interactions
(28,97), we expect that ancillary elements present in vivo
could increase α-synuclein affinity for nucleic acid associations.
We do not exclude that the ribosomal components
themselves could contribute to the formation of autogenous
associations (10,98). As a matter of fact, the ribosome
is the cellular component where autogenous interactions
and translational control could take place simultaneously.
Although our predictions on α-synuclein are compatible
with a feedback loop mechanism (15,16), experimental
evidence is required to determine the binding affinity of
α-synuclein for RNA molecules and evaluate whether the
interactions are mediated by other factors present in the
cellular context.

Our findings suggest that autogenous interactions are 
able to reduce protein expression by inducing a negative 
feedback loop at the translational level. We previously 
observed that a tight anti-correlation (97%) exists 
between mRNA expression levels and aggregation rates 
of proteins (99,100). This relationship suggests that an 
evolutionary pressure acts against formation of toxic 
aggregates (101), and a molecular mechanism is in 
place to control expression of amyloidogenic proteins 
(102). In the light of our new findings, it is possible 
to speculate that autogenous interactions directly 
reduce the aggregation potential of proteins by 
controlling expression via feedback loops. As a 
number of genes have been reported to be dosage- 
sensitive (86,103), it is tempting to hypothesize that au- 
togenous interactions play an important role in 
regulating their expression. 